25 research outputs found

    MODBASE, a database of annotated comparative protein structure models and associated resources.

    Get PDF
    MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence-structure alignment, model building and model assessment (http:/salilab.org/modeller). MODBASE currently contains 5,152,695 reliable models for domains in 1,593,209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/)

    MODBASE: a database of annotated comparative protein structure models and associated resources

    Get PDF
    MODBASE () is a database of annotated comparative protein structure models for all available protein sequences that can be matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on MODELLER for fold assignment, sequence–structure alignment, model building and model assessment (). MODBASE is updated regularly to reflect the growth in protein sequence and structure databases, and improvements in the software for calculating the models. MODBASE currently contains 3 094 524 reliable models for domains in 1 094 750 out of 1 817 889 unique protein sequences in the UniProt database (July 5, 2005); only models based on statistically significant alignments and models assessed to have the correct fold despite insignificant alignments are included. MODBASE also allows users to generate comparative models for proteins of interest with the automated modeling server MODWEB (). Our other resources integrated with MODBASE include comprehensive databases of multiple protein structure alignments (DBAli, ), structurally defined ligand binding sites and structurally defined binary domain interfaces (PIBASE, ) as well as predictions of ligand binding sites, interactions between yeast proteins, and functional consequences of human nsSNPs (LS-SNP, )

    GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains

    Get PDF
    GeMMA (Genome Modelling and Model Annotation) is a new approach to automatic functional subfamily classification within families and superfamilies of protein sequences. A major advantage of GeMMA is its ability to subclassify very large and diverse superfamilies with tens of thousands of members, without the need for an initial multiple sequence alignment. Its performance is shown to be comparable to the established high-performance method SCI-PHY. GeMMA follows an agglomerative clustering protocol that uses existing software for sensitive and accurate multiple sequence alignment and profile–profile comparison. The produced subfamilies are shown to be equivalent in quality whether whole protein sequences are used or just the sequences of component predicted structural domains. A faster, heuristic version of GeMMA that also uses distributed computing is shown to maintain the performance levels of the original implementation. The use of GeMMA to increase the functional annotation coverage of functionally diverse Pfam families is demonstrated. It is further shown how GeMMA clusters can help to predict the impact of experimentally determining a protein domain structure on comparative protein modelling coverage, in the context of structural genomics

    Regulatory Elements within the Prodomain of Falcipain-2, a Cysteine Protease of the Malaria Parasite Plasmodium falciparum

    Get PDF
    Falcipain-2, a papain family cysteine protease of the malaria parasite Plasmodium falciparum, plays a key role in parasite hydrolysis of hemoglobin and is a potential chemotherapeutic target. As with many proteases, falcipain-2 is synthesized as a zymogen, and the prodomain inhibits activity of the mature enzyme. To investigate the mechanism of regulation of falcipain-2 by its prodomain, we expressed constructs encoding different portions of the prodomain and tested their ability to inhibit recombinant mature falcipain-2. We identified a C-terminal segment (Leu155–Asp243) of the prodomain, including two motifs (ERFNIN and GNFD) that are conserved in cathepsin L sub-family papain family proteases, as the mediator of prodomain inhibitory activity. Circular dichroism analysis showed that the prodomain including the C-terminal segment, but not constructs lacking this segment, was rich in secondary structure, suggesting that the segment plays a crucial role in protein folding. The falcipain-2 prodomain also efficiently inhibited other papain family proteases, including cathepsin K, cathepsin L, cathepsin B, and cruzain, but it did not inhibit cathepsin C or tested proteases of other classes. A structural model of pro-falcipain-2 was constructed by homology modeling based on crystallographic structures of mature falcipain-2, procathepsin K, procathepsin L, and procaricain, offering insights into the nature of the interaction between the prodomain and mature domain of falcipain-2 as well as into the broad specificity of inhibitory activity of the falcipain-2 prodomain

    Using neural networks and evolutionary information in decoy discrimination for protein tertiary structure prediction

    Get PDF
    Background: We present a novel method of protein fold decoy discrimination using machine learning, more specifically using neural networks. Here, decoy discrimination is represented as a machine learning problem, where neural networks are used to learn the native-like features of protein structures using a set of positive and negative training examples. A set of native protein structures provides the positive training examples, while negative training examples are simulated decoy structures obtained by reversing the sequences of native structures. Various features are extracted from the training dataset of positive and negative examples and used as inputs to the neural networks.Results: Results have shown that the best performing neural network is the one that uses input information comprising of PSI-BLAST [1] profiles of residue pairs, pairwise distance and the relative solvent accessibilities of the residues. This neural network is the best among all methods tested in discriminating the native structure from a set of decoys for all decoy datasets tested. Conclusion: This method is demonstrated to be viable, and furthermore evolutionary information is successfully used in the neural networks to improve decoy discrimination

    Trends in template/fragment-free protein structure prediction

    Get PDF
    Predicting the structure of a protein from its amino acid sequence is a long-standing unsolved problem in computational biology. Its solution would be of both fundamental and practical importance as the gap between the number of known sequences and the number of experimentally solved structures widens rapidly. Currently, the most successful approaches are based on fragment/template reassembly. Lacking progress in template-free structure prediction calls for novel ideas and approaches. This article reviews trends in the development of physical and specific knowledge-based energy functions as well as sampling techniques for fragment-free structure prediction. Recent physical- and knowledge-based studies demonstrated that it is possible to sample and predict highly accurate protein structures without borrowing native fragments from known protein structures. These emerging approaches with fully flexible sampling have the potential to move the field forward

    Feature-rich distance-based terrain synthesis

    No full text
    This paper describes a novel terrain synthesis method based on distances in a weighted graph. A height field is determined by least-cost paths in a weighted graph from a set of generator nodes. The shapes of individual terrain features, such as mountains, hills, and craters, are specified by a monotonically decreasing profile describing the cross-sectional shape of a feature. The locations of features in the terrain are specified by placing the generators; secondary ridges are placed by pathing. We show the method to be robust and easy to control, even making it possible to embed images in terrain shadows. The method can produce a wide range of realistic synthetic terrains such as mountain ranges, craters, cinder cones, and hills. The ability to manually place terrain features that incorporate multiple profiles produces heterogeneous terrains that compare favorably to existing methods
    corecore